Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness
Adversarial examples are malicious inputs crafted to cause a model to
misclassify them. Their most common instantiation, "perturbation-based"
adversarial examples introduce changes to the input that leave its true label
unchanged, yet result in a different model prediction. Conversely,
"invariance-based" adversarial examples insert changes to the input that leave
the model's prediction unaffected despite the underlying input's label having
changed.
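As a rough illustration (not taken from the paper), the two notions can be written as predicates over a candidate input; `model`, `oracle` (a ground-truth labeller), and the use of an ℓ∞ ε-ball to bound the change are hypothetical stand-ins chosen for the sketch.

```python
# Minimal sketch contrasting the two definitions of adversarial example.
# `model` and `oracle` are assumed callables returning a class label;
# bounding the change by an l_inf eps-ball is one common convention.
import numpy as np

def is_perturbation_adversarial(x, x_adv, model, oracle, eps):
    """True label unchanged, model prediction flips."""
    within_ball = np.max(np.abs(x_adv - x)) <= eps      # l_inf constraint
    return (within_ball
            and oracle(x_adv) == oracle(x)               # true label unchanged
            and model(x_adv) != model(x))                # prediction changes

def is_invariance_adversarial(x, x_adv, model, oracle, eps):
    """True label changes, model prediction stays the same."""
    within_ball = np.max(np.abs(x_adv - x)) <= eps
    return (within_ball
            and oracle(x_adv) != oracle(x)               # true label changes
            and model(x_adv) == model(x))                # prediction unchanged
```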
In this paper, we demonstrate that robustness to perturbation-based
adversarial examples is not only insufficient for general robustness, but
worse, it can also increase vulnerability of the model to invariance-based
adversarial examples. In addition to analytical constructions, we empirically
study vision classifiers with state-of-the-art robustness to perturbation-based
adversaries constrained by an ℓ∞ norm. We mount attacks that exploit
excessive model invariance in directions relevant to the task, which are able
to find adversarial examples within the ℓ∞ ball. In fact, we find that
classifiers trained to be ℓ∞-norm robust are more vulnerable to
invariance-based adversarial examples than their undefended counterparts.
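A simplified sketch of such a search inside the ℓ∞ ball is given below. This is not the paper's attack procedure; it only illustrates the idea of moving toward a differently-labelled input while staying inside the ball and checking that the model's prediction does not move. The names `clip_to_linf_ball`, `invariance_candidate`, and the radius value are illustrative assumptions.

```python
# Hypothetical sketch: project a differently-labelled image x_target onto the
# l_inf eps-ball around x_source, then keep the result only if the model's
# prediction is unchanged (i.e. the model is invariant to the change).
import numpy as np

def clip_to_linf_ball(x_source, x_target, eps):
    """Project x_target onto the l_inf ball of radius eps around x_source."""
    return np.clip(x_target, x_source - eps, x_source + eps)

def invariance_candidate(x_source, x_target, model, eps=0.3):
    """Return a candidate invariance-based adversarial example, or None."""
    x_adv = clip_to_linf_ball(x_source, x_target, eps)
    if model(x_adv) == model(x_source):   # prediction unaffected by the change
        return x_adv                      # yet the true label may have changed
    return None
```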
Excessive invariance is not limited to models trained to be robust to
perturbation-based ℓ∞-norm adversaries. In fact, we argue that the term
adversarial example is used to capture a series of model limitations, some of
which may not have been discovered yet. Accordingly, we call for a set of
precise definitions that taxonomize and address each of these shortcomings in
learning.

Comment: Accepted at the ICLR 2019 SafeML Workshop